Canterbury
Applying Large Language Models to Characterize Public Narratives
Poole-Dayan, Elinor, Kessler, Daniel T, Chiou, Hannah, Hughes, Margaret, Lin, Emily S, Ganz, Marshall, Roy, Deb
Public Narratives (PNs) are key tools for leadership development and civic mobilization, yet their systematic analysis remains challenging due to their subjective interpretation and the high cost of expert annotation. In this work, we propose a novel computational framework that leverages large language models (LLMs) to automate the qualitative annotation of public narratives. Using a codebook we co-developed with subject-matter experts, we evaluate LLM performance against that of expert annotators. Our work reveals that LLMs can achieve near-human-expert performance, with an average F1 score of 0.80 across 8 narratives and 14 codes. We then extend our analysis to empirically explore how PN framework elements manifest across a larger dataset of 22 stories. Lastly, we extrapolate our analysis to a set of political speeches, establishing a novel lens through which to analyze political rhetoric in civic spaces. This study demonstrates the potential of LLM-assisted annotation for scalable narrative analysis and highlights key limitations and directions for future research in computational civic storytelling.
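A minimal sketch of the evaluation protocol the abstract describes: per-code F1 between LLM annotations and expert labels, averaged across codes. The code names and label vectors below are hypothetical stand-ins, not the authors' actual codebook or data.

```python
# Hypothetical evaluation sketch: compare LLM code assignments to expert
# annotations with per-code F1, then average across codes.
from sklearn.metrics import f1_score

codes = ["challenge", "choice", "outcome", "shared_values"]  # made-up codes

# One binary vector per code: does each narrative segment carry the code?
expert = {"challenge": [1, 0, 1, 1], "choice": [0, 1, 1, 0],
          "outcome": [1, 1, 0, 0], "shared_values": [0, 0, 1, 1]}
llm    = {"challenge": [1, 0, 1, 0], "choice": [0, 1, 1, 0],
          "outcome": [1, 0, 0, 0], "shared_values": [0, 0, 1, 1]}

per_code = {c: f1_score(expert[c], llm[c]) for c in codes}
print(per_code, "mean F1:", round(sum(per_code.values()) / len(per_code), 2))
```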
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Africa > Cameroon > Gulf of Guinea (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (8 more...)
Cognitive Linguistic Identity Fusion Score (CLIFS): A Scalable Cognition-Informed Approach to Quantifying Identity Fusion from Text
Wright, Devin R., An, Jisun, Ahn, Yong-Yeol
Quantifying identity fusion -- the psychological merging of self with another entity or abstract target (e.g., a religious group, political party, ideology, value, brand, or belief) -- is vital for understanding a wide range of group-based human behaviors. We introduce the Cognitive Linguistic Identity Fusion Score (CLIFS), a novel metric built on implicit metaphor detection that integrates cognitive linguistics with large language models (LLMs). Unlike traditional pictorial and verbal scales, which require controlled surveys or direct field contact, CLIFS delivers fully automated, scalable assessments while maintaining strong alignment with the established verbal measure. In benchmarks, CLIFS outperforms both existing automated approaches and human annotation. As a proof of concept, we apply CLIFS to violence risk assessment, demonstrating an improvement of more than 240%. Building on this new NLP task and its early success, we underscore the need for larger, more diverse datasets that encompass additional fusion-target domains and cultural backgrounds to enhance generalizability and further advance this emerging area. CLIFS models and code are public at https://github.com/DevinW-sudo/CLIFS.
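As a loose illustration of replacing a verbal fusion scale with an LLM-based rating, the sketch below prompts a chat model for a 0-7 score. The prompt wording, scale, and model choice are our assumptions, not the CLIFS method itself; see the linked repository for the actual approach.

```python
# Hedged sketch: ask an LLM for a verbal-scale-style fusion rating.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def fusion_score(text: str, target: str) -> float:
    """Return a 0-7 identity-fusion rating (illustrative, not CLIFS)."""
    prompt = (
        f"Rate from 0 (no fusion) to 7 (complete fusion) how strongly the "
        f"author's language merges their personal identity with {target}. "
        f"Answer with a single number only.\n\nText: {text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return float(resp.choices[0].message.content.strip())
```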
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > Virginia (0.04)
- North America > United States > Indiana (0.04)
- (9 more...)
- Overview (0.87)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Fuck the Algorithm: Conceptual Issues in Algorithmic Bias
Algorithmic bias has been the subject of much recent controversy. To clarify what is at stake and to make progress resolving the controversy, a better understanding of the concepts involved would be helpful. The discussion here focuses on the disputed claim that algorithms themselves cannot be biased. To clarify this claim we need to know what kind of thing 'algorithms themselves' are, and to disambiguate the several meanings of 'bias' at play. This further involves showing how bias of moral import can result from statistical biases, and drawing connections to previous conceptual work about political artifacts and oppressive things. Data bias has been identified in domains like hiring, policing and medicine. Examples where algorithms themselves have been pinpointed as the locus of bias include recommender systems that influence media consumption, academic search engines that influence citation patterns, and the 2020 UK algorithmically-moderated A-level grades. Recognition that algorithms are a kind of thing that can be biased is key to making decisions about responsibility for harm, and preventing algorithmically mediated discrimination.
- North America > Canada > Ontario > Toronto (0.04)
- North America > Canada > Ontario > Kingston (0.04)
- Oceania > New Zealand > South Island > Canterbury > Christchurch (0.04)
- (2 more...)
- Law > Civil Rights & Constitutional Law (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Education > Educational Setting (1.00)
- (3 more...)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Information Management > Search (0.88)
Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs
Saeed, Muhammed, Mohamed, Elgizouli, Mohamed, Mukhtar, Raza, Shaina, Abdul-Mageed, Muhammad, Shehata, Shady
Large language models (LLMs) are widely used but raise ethical concerns due to embedded social biases. This study examines LLM biases against Arabs versus Westerners across eight domains, including women's rights, terrorism, and anti-Semitism, and assesses model resistance to perpetuating these biases. To this end, we create two datasets: one to evaluate LLM bias toward Arabs versus Westerners and another to test model safety against prompts that exaggerate negative traits ("jailbreaks"). We evaluate six LLMs -- GPT-4, GPT-4o, Llama 3.1 (8B & 405B), Mistral 7B, and Claude 3.5 Sonnet. We find negative biases toward Arabs in 79% of cases, with Llama 3.1-405B being the most biased. Our jailbreak tests reveal GPT-4o as the most vulnerable, followed by Llama 3.1-8B and Mistral 7B. All LLMs except Claude exhibit attack success rates above 87% in three categories. Claude 3.5 Sonnet is the safest overall, but it still displays biases in seven of eight categories. Notably, despite being an optimized version of GPT-4, GPT-4o is more prone to both biases and jailbreaks, suggesting optimization flaws. Our findings underscore the pressing need for more robust bias mitigation strategies and strengthened security measures in LLMs.
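For concreteness, the per-category attack success rate (ASR) reported above can be tallied as in the sketch below; the category names and outcomes are fabricated placeholders, not the paper's data.

```python
# Illustrative ASR bookkeeping: the fraction of adversarial prompts per
# category for which the model produced the disallowed biased completion.
from collections import defaultdict

results = [  # (category, attack_succeeded) -- made-up examples
    ("terrorism", True), ("terrorism", True), ("terrorism", False),
    ("womens_rights", True), ("womens_rights", False),
]

totals, hits = defaultdict(int), defaultdict(int)
for category, success in results:
    totals[category] += 1
    hits[category] += success  # bool counts as 0/1

asr = {c: hits[c] / totals[c] for c in totals}
print(asr)  # e.g. {'terrorism': 0.67, 'womens_rights': 0.5}
```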
- Asia > Middle East > Oman (0.46)
- Asia > Middle East > Qatar (0.28)
- Asia > Middle East > Kuwait (0.28)
- (34 more...)
- Media (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- Law Enforcement & Public Safety > Terrorism (1.00)
- (7 more...)
Australia's spy chief warns AI will accelerate online radicalisation
The head of Australia's peak intelligence agency has warned that people like the Christchurch terrorist are being radicalised on social media, and that artificial intelligence is likely to make it much worse. The director general of the Australian Security Intelligence Organisation (Asio), Mike Burgess, told a social media summit in Adelaide on Friday that social media was "both a goldmine and a cesspit" that creates communities and divides them, and that the internet was "the world's most potent incubator of extremism". He said people were embracing anti-authority ideologies, conspiracy theories and diverse grievances, and that while social media was not the sole driver, Asio considered it a "significant driver". "Social media allows extremist ideologies, conspiracies, dis- and misinformation to be shared at an unprecedented scale and speed," he said. Radicalisation can now take days or weeks rather than the months or years it previously did, he said, with the most likely perpetrator of a terrorist attack being a lone actor.
- Oceania > Australia (0.95)
- Oceania > New Zealand > South Island > Canterbury > Christchurch (0.05)
- Europe > France (0.05)
Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement
Sheth, Paras, Kumarage, Tharindu, Moraffah, Raha, Chadha, Aman, Liu, Huan
Content moderation faces a challenging task: social media's ability to spread hate speech contrasts with its role in promoting global connectivity. With rapidly evolving slang and hate speech, conventional deep learning adapts poorly to the fluid landscape of online dialogue. In response, causality-inspired disentanglement has shown promise by segregating platform-specific peculiarities from universal hate indicators. However, its dependency on ground-truth target labels for discerning these nuances faces practical hurdles given the incessant evolution of platforms and the mutable nature of hate speech. Using confidence-based reweighting and contrastive regularization, this study presents HATE WATCH, a novel framework of weakly supervised causal disentanglement that circumvents the need for explicit target labeling and effectively disentangles input features into invariant representations of hate. Empirical validation across four platforms (two with target labels and two without) positions HATE WATCH as a novel method for cross-platform hate speech detection with superior performance. HATE WATCH advances scalable content moderation techniques towards developing safer online communities.
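A minimal PyTorch sketch of the two named ingredients as we read them: confidence-based reweighting of weak labels, and a contrastive regularizer pulling platform-invariant hate representations together. This is our illustration under those assumptions, not the authors' HATE WATCH implementation.

```python
# Sketch of confidence-weighted weak supervision plus contrastive regularization.
import torch
import torch.nn.functional as F

def weighted_bce(logits, weak_labels, confidence):
    """Confidence-based reweighting: low-confidence weak labels count less.
    `weak_labels` and `confidence` are float tensors in [0, 1]."""
    per_example = F.binary_cross_entropy_with_logits(
        logits, weak_labels, reduction="none")
    return (confidence * per_example).mean()

def contrastive_reg(z_anchor, z_pos, z_neg, margin=1.0):
    """Pull same-class (hate) embeddings together across platforms and push
    hate/non-hate embeddings apart by at least `margin` (triplet-style)."""
    d_pos = F.pairwise_distance(z_anchor, z_pos)
    d_neg = F.pairwise_distance(z_anchor, z_neg)
    return F.relu(d_pos - d_neg + margin).mean()
```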
- Oceania > New Zealand > South Island > Canterbury > Christchurch (0.04)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- Media > News (0.48)
- Health & Medicine (0.47)
- Law > Civil Rights & Constitutional Law (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Evaluating Human-Language Model Interaction
Lee, Mina, Srivastava, Megha, Hardy, Amelia, Thickstun, John, Durmus, Esin, Paranjape, Ashwin, Gerard-Ursin, Ines, Li, Xiang Lisa, Ladhak, Faisal, Rong, Frieda, Wang, Rose E., Kwon, Minae, Park, Joon Sung, Cao, Hancheng, Lee, Tony, Bommasani, Rishi, Bernstein, Michael, Liang, Percy
Many real-world applications of language models (LMs), such as writing assistance and code autocomplete, involve human-LM interaction. However, most benchmarks are non-interactive in that a model produces output without human involvement. To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive systems and dimensions to consider when designing evaluation metrics. Compared to standard, non-interactive evaluation, HALIE captures (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality (e.g., enjoyment and ownership). We then design five tasks to cover different forms of interaction: social dialogue, question answering, crossword puzzles, summarization, and metaphor generation. With four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21 Labs' Jurassic-1), we find that better non-interactive performance does not always translate to better human-LM interaction. In particular, we highlight three cases where the results from non-interactive and interactive metrics diverge and underscore the importance of human-LM interaction for LM evaluation.
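To make the distinction concrete, an interaction trace under a HALIE-style evaluation might record process and first-person signals alongside the final output. The field names below are our own illustrative choices, not the framework's actual schema.

```python
# Sketch of an interaction trace capturing process and subjective experience.
from dataclasses import dataclass, field

@dataclass
class Turn:
    user_input: str
    model_output: str
    latency_s: float   # process: how long the user waited
    accepted: bool     # process: did the user keep the suggestion?

@dataclass
class Session:
    task: str                          # e.g. "crossword", "summarization"
    turns: list[Turn] = field(default_factory=list)
    enjoyment: int = 0                 # first-person survey, e.g. 1-5 Likert
    ownership: int = 0                 # preference beyond output quality
```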
- Europe > United Kingdom > England > West Yorkshire (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
- (7 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Leisure & Entertainment > Games (1.00)
- Law Enforcement & Public Safety (1.00)
- Education (1.00)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Artificial Intelligence is no match for the human heart - The Big Issue
Nick Cave said something interesting last week. On this occasion, he was reacting to a question from a fan. Cave does this a lot on the Red Hand Files, his online repository where he answers any number and range of enquiries from devotees. This one was about artificial intelligence. There is an open-access AI bot, ChatGPT, that some people have been playing with to see if it can create as well as a human can. Mark, from Christchurch in New Zealand, fed in a load of Cave's lyrics, got a set of generated lyrics back, and sent them to Cave asking for his reaction.
TINYCD: A (Not So) Deep Learning Model For Change Detection
Codegoni, Andrea, Lombardi, Gabriele, Ferrari, Alessandro
In this paper, we present a lightweight and effective change detection model, called TinyCD. This model has been designed to be faster and smaller than current state-of-the-art change detection models in order to meet industrial needs. Despite being 13 to 140 times smaller than the compared change detection models, and requiring at most a third of their computational complexity, our model outperforms the current state of the art by at least 1% on both F1 score and IoU on the LEVIR-CD dataset, and by more than 8% on the WHU-CD dataset. To reach these results, TinyCD uses a Siamese U-Net architecture that exploits low-level features in a globally temporal and locally spatial way. In addition, it adopts a new strategy for mixing features in the space-time domain: it merges the embeddings obtained from the Siamese backbones and, coupled with an MLP block, forms a novel space-semantic attention mechanism, the Mix and Attention Mask Block (MAMB). Source code, models and results are available here: https://github.com/AndreaCodegoni/Tiny_model_4_CD
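A loose PyTorch sketch of the space-time mixing idea: features from the two Siamese branches are interleaved channel-wise, mixed by a grouped convolution, and gated by a learned mask. The real MAMB differs in detail; see the linked repository for the authors' implementation.

```python
# Illustrative space-time mixing block, inspired by (not identical to) MAMB.
import torch
import torch.nn as nn

class MixBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Grouped conv: each group sees the (t1, t2) pair of one channel.
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=3,
                             padding=1, groups=channels)
        self.to_mask = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, feat_t1, feat_t2):
        # Interleave the two time steps channel-wise: (B, 2C, H, W).
        b, c, h, w = feat_t1.shape
        stacked = torch.stack([feat_t1, feat_t2], dim=2).reshape(b, 2 * c, h, w)
        mixed = self.mix(stacked)
        return mixed * self.to_mask(mixed)  # space-semantic attention gating
```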
- Oceania > New Zealand > South Island > Canterbury > Christchurch (0.14)
- South America > Ecuador (0.04)
- South America > Colombia (0.04)
Inferring Offensiveness In Images From Natural Language Supervision
Schramowski, Patrick, Kersting, Kristian
Probing or fine-tuning (large-scale) pre-trained models results in state-of-the-art performance for many NLP tasks and, more recently, even for computer vision tasks when combined with image data. Unfortunately, these approaches also entail severe risks. In particular, large image datasets automatically scraped from the web may contain derogatory terms as categories and offensive images, and may also underrepresent specific classes. Consequently, there is an urgent need to carefully document datasets and curate their content. Unfortunately, this process is tedious and error-prone. We show that pre-trained transformers themselves provide a methodology for the automated curation of large-scale vision datasets. Based on human-annotated examples and the implicit knowledge of a CLIP-based model, we demonstrate that one can select relevant prompts for rating the offensiveness of an image. Deep learning models have yielded improvements in many fields. In particular, transfer learning from models pre-trained on large-scale supervised data has become common practice for many tasks, both with and without sufficient data to train deep models directly. While approaches like semi-supervised sequence learning (Dai & Le, 2015) and datasets such as ImageNet (Deng et al., 2009), especially the ImageNet-ILSVRC-2012 dataset with 1.2 million images, established pre-training approaches, training data sizes have since increased rapidly to billions of examples (Brown et al., 2020; Jia et al., 2021), steadily improving the capabilities of deep models.
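A hedged sketch of the prompt-rating idea using an off-the-shelf CLIP model: score an image against contrasting text prompts and flag it when the offensive prompt dominates. The prompt texts and threshold are our assumptions, not the paper's selected prompts.

```python
# Sketch: rate image offensiveness via CLIP image-text similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["an offensive, derogatory image",   # assumed prompt wording
           "a benign, harmless image"]

def is_offensive(path: str, threshold: float = 0.5) -> bool:
    image = Image.open(path)
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    return probs[0, 0].item() > threshold  # mass on the offensive prompt
```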
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- Oceania > New Zealand > South Island > Canterbury > Christchurch (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)